Introduction

Biomedical database evaluation can identify trends in clinical research, as well as emerging therapeutic areas, directly influencing basic research funding, clinical candidate selection, trial designs, and treatment protocols. Traditionally, these evaluations require substantial manual effort and involve working groups. We are examining ways to support these activities using advanced artificial intelligence (AI) and machine learning (ML) methods.

Methods

We used a systematic approach to identify scientific publications related to hematological neoplastic diseases (HND). We extracted terms and definitions from the WHO classification of tumors of hematopoietic and lymphoid tissues to create a basic terminology dataset. We expanded this terminology with data from the OncoTree and MONDO ontologies to develop a comprehensive vocabulary. In addition, we included non-standard names, terms, and abbreviations representing HND based on expert input. We then performed regular expression-based searches to find abstracts published in PubMed (Jan 1950 – Apr 2025; 38,644,614 abstracts) that mention HND. We additionally used BERT-based language models for annotation and categorization. This process was repeated multiple times with manual expert oversight to reduce false positives and false negatives. The annotation results were then combined with the iCite database.
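The vocabulary-driven search step can be illustrated with a minimal sketch. The term lists, the category names, and the helper functions `compile_patterns` and `find_categories` below are hypothetical stand-ins, not the vocabulary or code used in the study; the sketch only shows one plausible way to turn a term dictionary into whole-word, case-insensitive regular expressions.

```python
import re

# Hypothetical mini-vocabulary for illustration only; the study's actual
# terminology (WHO, OncoTree, MONDO, expert additions) is far larger.
VOCABULARY = {
    "Hodgkin lymphoma": ["Hodgkin lymphoma", "Hodgkin's lymphoma", "Hodgkin disease"],
    "myelodysplastic syndromes": ["myelodysplastic syndrome", "myelodysplastic syndromes", "MDS"],
    "polycythemia vera": ["polycythemia vera", "polycythaemia vera"],
}


def compile_patterns(vocabulary):
    """Build one case-insensitive, whole-word pattern per category."""
    patterns = {}
    for category, terms in vocabulary.items():
        # Put longer terms first so they are preferred within the alternation.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in ordered)
        patterns[category] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


def find_categories(abstract, patterns):
    """Return the sorted list of categories whose terms occur in the abstract."""
    return sorted(cat for cat, pat in patterns.items() if pat.search(abstract))


PATTERNS = compile_patterns(VOCABULARY)
hits = find_categories("Patients with Hodgkin's lymphoma and MDS were enrolled.", PATTERNS)
print(hits)
```

Short abbreviations such as "MDS" are ambiguous in free text, so a pattern match of this kind can only nominate candidates; this is consistent with the model-based annotation and expert review that the abstract describes as follow-up steps.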

Translational potential was assessed using a composite score that integrates clinical relevance and research impact from iCite metadata.
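As a minimal sketch of how such a composite could be computed, the code below applies the component weights stated in this section (40/30/30 for clinical relevance, 50/30/20 for research impact, and 0.6/0.4 for the overall score). The abstract does not specify how the individual components were scaled, so the `Publication` field names, the caps, and the rescaling of every component to the range 0 to 1 are assumptions made for illustration; the resulting numbers are not comparable to the scores reported in the Results.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Publication:
    # Hypothetical field names, loosely modeled on citation metadata.
    human_fraction: float     # share of the paper classified as human study, 0..1
    is_clinical: bool         # designated as a clinical study
    clinical_citations: int   # citations received from clinical articles
    citation_count: int       # total citations
    rcr: float                # relative citation ratio
    nih_percentile: float     # 0..100


# Assumed caps used only to rescale unbounded components to 0..1.
CLINICAL_CITATION_CAP = 10
CITATION_CAP = 1000
RCR_CAP = 10.0


def _scaled(value, cap):
    """Clip a non-negative value at cap and rescale it to 0..1."""
    return min(value, cap) / cap


def clinical_relevance(pub):
    """CRS: human studies 40%, clinical designation 30%, clinical citations 30%."""
    return (0.4 * pub.human_fraction
            + 0.3 * float(pub.is_clinical)
            + 0.3 * _scaled(pub.clinical_citations, CLINICAL_CITATION_CAP))


def research_impact(pub):
    """RIS: log citation count 50%, relative citation ratio 30%, NIH percentile 20%."""
    log_citations = _scaled(math.log1p(pub.citation_count), math.log1p(CITATION_CAP))
    return (0.5 * log_citations
            + 0.3 * _scaled(pub.rcr, RCR_CAP)
            + 0.2 * pub.nih_percentile / 100.0)


def translational_potential(pub):
    """TPS = 0.6 * CRS + 0.4 * RIS."""
    return 0.6 * clinical_relevance(pub) + 0.4 * research_impact(pub)


def mean_tps_by_subcategory(records):
    """Arithmetic mean of per-publication TPS within each subcategory."""
    grouped = {}
    for subcategory, pub in records:
        grouped.setdefault(subcategory, []).append(translational_potential(pub))
    return {subcategory: mean(scores) for subcategory, scores in grouped.items()}
```

Computing the score per publication and averaging afterwards, as in `mean_tps_by_subcategory`, mirrors the order of operations described in the text; averaging the raw components first would generally give a different result because the citation term is log-transformed.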

The Clinical Relevance Score (CRS) includes classification of human studies (40% weight), designation of clinical studies (30%), and clinical citations (30%). The Research Impact Score (RIS) combines log-transformed citation counts (50%), the relative citation ratio (30%), and NIH percentile (20%). The overall Translational Potential Score (TPS) was calculated as TPS = 0.6 × CRS + 0.4 × RIS, emphasizing clinical applicability. Scores were determined for individual publications and then averaged within WHO subcategories using arithmetic means. We reported scores as scaled composite indices rather than normalized values for ease of computation. The weights were chosen based on expert opinion. We furthermore examined temporal trends in translational potential by comparing scores for the overall cohort with scores for publications from the past 10 years.

Results

Analysis of 632,471 publications (1.6% of the PubMed abstracts searched) demonstrated that HND research is dominated by three major disease categories: leukemia (39.8%), lymphoma (31.2%), and myeloma (10.3%), together representing over 80% of the literature. The remaining research focused on myelodysplastic syndromes (3.6%), myeloproliferative neoplasms including myelofibrosis and polycythemia vera (2.2%), histiocytic disorders (3.1%), lymphoproliferative disorders (2.1%), and mature T-cell neoplasms such as mycosis fungoides (1.2%). Historically, among 11 major subcategories with sufficient data representation, Hodgkin lymphomas showed the highest TPS (score: 710,219), driven by high clinical relevance (80.2% human studies) and citation impact. Myelodysplastic syndromes ranked second (score: 667,226) with 74.2% human studies, followed by myeloproliferative neoplasms (score: 599,926; 71.3% human studies). The analysis revealed a clear translational hierarchy, with lymphoid malignancies (Hodgkin lymphomas, mature T/NK-cell neoplasms) and myeloid disorders (myelodysplastic syndromes, myeloproliferative neoplasms) showing higher TPS than rare entities such as dendritic cell neoplasms (score: 171,291).

Temporal analysis within the last 10 years showed that myeloid neoplasms have the highest recent TPS (score: 712,343), with myelodysplastic syndromes leading all subcategories (832,439). All major categories showed increasing TPS (+16.4% to +22.1%) except histiocytic disorders (-25.8%). Research activity expanded across growing categories (+71.9% to +96.7% paper count increases), indicating a shift toward myeloid disorder translational research in hematologic oncology.

Conclusions

Our analysis of publications focusing on hematological neoplasms reveals a paradigm shift in translational research activities. While leukemia, lymphoma, and myeloma have historically dominated the literature by volume, myelodysplastic syndromes exhibit the highest translational potential over the last 10 years. To fully utilize the potential of the proposed methodology, a robust and validated framework for research prioritization needs to be created, which requires broader input from the medical community.
